[SC-7864] Create credit risk scorecard notebook using XGBoost #280

Merged
juanmleng merged 14 commits into main from juan5508/sc-7864/create-credit-risk-scorecard-notebook-using-xgboost
Jan 3, 2025

Conversation

@juanmleng juanmleng commented Jan 2, 2025

Internal Notes for Reviewers

Add new application scorecard notebooks using ML with additional testing:

  • application_scorecard_with_ml.ipynb: running individual tests
  • application_scorecard_full_suite: using run_documentation_tests()

External Release Notes

Add new application scorecard notebooks using ML with additional testing:

  • application_scorecard_with_ml.ipynb: running individual tests
  • application_scorecard_full_suite: using run_documentation_tests()

@juanmleng juanmleng added the internal (Not to be externalized in the release notes) and DO NOT MERGE (PR is not ready to be merged) labels Jan 2, 2025
@juanmleng juanmleng self-assigned this Jan 2, 2025
@juanmleng juanmleng removed the DO NOT MERGE (PR is not ready to be merged) label Jan 3, 2025

@johnwalz97 johnwalz97 left a comment

lgtm

github-actions bot commented Jan 3, 2025

PR Summary

This pull request adds enhancements and bug fixes to the ValidMind Library, focused on credit risk scorecard modeling. The key changes are:

  1. New Notebooks: Two new Jupyter notebooks have been added to demonstrate the application scorecard model using the ValidMind Library. These notebooks provide a step-by-step guide for loading a demo dataset, preprocessing data, training models, and documenting the model using ValidMind.

  2. New Tests: Several new tests have been added to the validmind/tests directory, including:

    • MutualInformation: Evaluates feature relevance by calculating mutual information scores between features and the target variable.
    • ScoreBandDefaultRates: Analyzes default rates and population distribution across credit score bands.
    • CalibrationCurve: Assesses the calibration of probability estimates by comparing predicted probabilities against observed frequencies.
    • ClassifierThresholdOptimization: Analyzes and visualizes different threshold optimization methods for binary classification models.
    • ModelParameters: Extracts and displays model parameters for transparency and reproducibility.
    • ScoreProbabilityAlignment: Evaluates the alignment between credit scores and predicted probabilities.

  3. Enhancements to Existing Tests: Modifications have been made to existing tests to improve their functionality and accuracy. For example, the TooManyZeroValues test now includes a row count and uses a percentage threshold for zero values.

  4. Dataset Splitting Functionality: The split function in lending_club.py has been enhanced to support an optional validation set, allowing for more flexible dataset splitting.

  5. Test Configuration Utility: A new utility function get_demo_test_config has been added to generate a default test configuration for demo purposes.

  6. Version Update: The version of the ValidMind Library has been updated from 2.7.3 to 2.7.4.

  7. Bug Fixes: Various bug fixes have been implemented, including corrections to test logic and improvements to test coverage.
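
To make the optional validation set in item 4 concrete, here is a minimal sketch of a split that returns either two or three partitions. The helper name `split_rows` and its parameters (`test_size`, `validation_size`, `seed`) are assumptions made for illustration; they are not the signature of the split function in lending_club.py, which this summary does not show.

```python
import random


def split_rows(rows, test_size=0.25, validation_size=None, seed=42):
    """Shuffle rows and split them into train/test or train/validation/test.

    Hypothetical helper for illustration only; not the lending_club.py API.
    Sizes are fractions of the full dataset.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    if validation_size is not None and not 0 < validation_size < 1 - test_size:
        raise ValueError("validation_size must leave room for a training set")

    # Shuffle a copy with a seeded generator so the split is reproducible.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_test = int(round(len(shuffled) * test_size))
    test = shuffled[:n_test]
    remainder = shuffled[n_test:]

    # Without a validation size, keep the two-way behaviour.
    if validation_size is None:
        return remainder, test

    n_validation = int(round(len(shuffled) * validation_size))
    validation = remainder[:n_validation]
    train = remainder[n_validation:]
    return train, validation, test


rows = list(range(100))
train, test = split_rows(rows, test_size=0.2)
print(len(train), len(test))  # 80 20

train, validation, test = split_rows(rows, test_size=0.2, validation_size=0.1)
print(len(train), len(validation), len(test))  # 70 10 20
```

Returning a two-tuple when no validation size is given keeps existing two-way callers working, which is one plausible way to make the validation set optional.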

Test Suggestions

  • Run the new Jupyter notebooks to ensure they execute without errors and produce the expected outputs.
  • Verify the functionality of the new tests by running them with different datasets and configurations.
  • Test the enhanced split function with various dataset sizes and configurations to ensure it correctly handles train, validation, and test splits.
  • Check the accuracy and performance of the MutualInformation and ScoreBandDefaultRates tests with known datasets.
  • Validate the CalibrationCurve and ClassifierThresholdOptimization tests by comparing their outputs with expected calibration and threshold optimization results.
  • Ensure the ModelParameters test correctly extracts parameters from different model types.
  • Test the ScoreProbabilityAlignment test with datasets having different score distributions.
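
One way to act on the suggestion to check ScoreBandDefaultRates against a known dataset is to compute the expected figures by hand on a tiny sample and compare them with the test output. The sketch below is an independent reference calculation, not the ValidMind implementation; the function name, the band edges, and the ten-row sample are all assumptions chosen so the numbers are easy to verify.

```python
def score_band_default_rates(scores, defaults, band_edges):
    """Return count, population share, and default rate per [low, high) band.

    Hypothetical reference calculation; not the ValidMind test itself.
    `defaults` holds 1 for a defaulted loan and 0 otherwise.
    """
    if len(scores) != len(defaults):
        raise ValueError("scores and defaults must have the same length")

    total = len(scores)
    results = []
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        # Collect the default flags of every loan whose score falls in the band.
        in_band = [d for s, d in zip(scores, defaults) if low <= s < high]
        count = len(in_band)
        results.append(
            {
                "band": f"{low}-{high}",
                "count": count,
                "population_share": count / total if total else 0.0,
                # An empty band has no defined default rate.
                "default_rate": sum(in_band) / count if count else None,
            }
        )
    return results


# Small hand-checkable sample: higher scores default less often.
scores = [320, 410, 450, 520, 580, 610, 640, 700, 720, 790]
defaults = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

bands = score_band_default_rates(scores, defaults, [300, 500, 650, 850])
for band in bands:
    print(band)
```

On this sample the bands hold 3, 4, and 3 loans with default rates of 2/3, 1/2, and 0, so any implementation using the same half-open band convention should reproduce those values.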

@juanmleng juanmleng merged commit 129c33c into main Jan 3, 2025
6 checks passed
@johnwalz97 johnwalz97 deleted the juan5508/sc-7864/create-credit-risk-scorecard-notebook-using-xgboost branch January 6, 2025 16:00
@cachafla cachafla added the enhancement (New feature or request) label and removed the internal (Not to be externalized in the release notes) label Jan 28, 2025
Labels

enhancement New feature or request

3 participants